Histogram-Based Method for Effective Initialization of the K-Means Clustering Algorithm

نویسندگان

  • Caroline Gingles
  • M. Emre Celebi
چکیده

K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, this algorithm is highly sensitive to the initial selection of the cluster centers. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have superlinear complexity in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the order in which the data points are processed. These methods are generally unreliable in that the quality of their results is unpredictable. In this paper, we propose a linear, deterministic, and order-invariant initialization method based on multidimensional histograms. Experiments on a diverse collection of data sets from the UCI Machine Learning Repository demonstrate the superiority of our method over the well-known maximin method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improved COA with Chaotic Initialization and Intelligent Migration for Data Clustering

A well-known clustering algorithm is K-means. This algorithm, besides advantages such as high speed and ease of employment, suffers from the problem of local optima. In order to overcome this problem, a lot of studies have been done in clustering. This paper presents a hybrid Extended Cuckoo Optimization Algorithm (ECOA) and K-means (K), which is called ECOA-K. The COA algorithm has advantages ...

متن کامل

Modified Convex Data Clustering Algorithm Based on Alternating Direction Method of Multipliers

Knowing the fact that the main weakness of the most standard methods including k-means and hierarchical data clustering is their sensitivity to initialization and trapping to local minima, this paper proposes a modification of convex data clustering  in which there is no need to  be peculiar about how to select initial values. Due to properly converting the task of optimization to an equivalent...

متن کامل

A Hybrid Data Clustering Algorithm Using Modified Krill Herd Algorithm and K-MEANS

Data clustering is the process of partitioning a set of data objects into meaning clusters or groups. Due to the vast usage of clustering algorithms in many fields, a lot of research is still going on to find the best and efficient clustering algorithm. K-means is simple and easy to implement, but it suffers from initialization of cluster center and hence trapped in local optimum. In this paper...

متن کامل

An automatic histogram-based initializing algorithm for K-means clustering in CT

K-means clustering [1] has been widely used in various applications. One intrinsic limitation in K-means clustering is that the choice of initial clustering centroids may highly influence the performance of the algorithm. Some existing K-means initialization algorithms could generally achieve good results. However, in certain cases, such as CT images that contain several materials with similar ...

متن کامل

Persistent K-Means: Stable Data Clustering Algorithm Based on K-Means Algorithm

Identifying clusters or clustering is an important aspect of data analysis. It is the task of grouping a set of objects in such a way those objects in the same group/cluster are more similar in some sense or another. It is a main task of exploratory data mining, and a common technique for statistical data analysis This paper proposed an improved version of K-Means algorithm, namely Persistent K...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014